rm(list=ls())
#importing relevant libraries
library(ggplot2)
library(forecast)
library(dynlm)
library(fpp2)
library(MASS)
library(stats)
library(magrittr)
library(tseries)
library(strucchange)
library(vars)
library(quantmod)
library(astsa)
library(ForecastComb)
library(rockchalk)
library(dplyr)
library(rugarch)
library(Hmisc)
library(stats4)

I. (5%) Introduction (describe the data, provide some background on the topic, etc.).

As we look toward the future, laborers in all industries are beginning to experience, and will continue to experience, the effects of what is now being called the fourth industrial revolution. In the wake of this change in the shape and nature of industry, we must be forward-thinking about the direction in which we orient the coming generations of American workers. For this reason, forecasting future job hires and openings across industries will only grow in importance as a way to maximize the efficiency of the labor-sorting process.
Keeping this in mind, the data for this project are taken from the Bureau of Labor Statistics. Our two data sets describe the monthly number of hires in two industries: (1) Professional and Business Services and (2) Financial Activities. We chose the finance and business-service industries because these two sectors of the economy overlap in a variety of ways. If job growth in one moves inversely with job growth in the other, this might indicate that they are targeting the same labor pool. Alternatively, if job growth in one leads job growth in the other, this might indicate a symbiotic relationship between the finance and business-services sectors. If we were to apply this model over a longer horizon, we might be able to project relative job growth in these industries, and hence better orient training and education in the present.
The monthly data are in units of thousands of hires, span December 2000 through December 2019, and were aggregated on the BLS website via the Job Openings and Labor Turnover Survey.

II. (80%) Results (answers and plots).

Data Set 1: Business and Professional Service Hires

(a) Produce a time-series plot of your data including the respective ACF and PACF plots.

#reading in the csv data file for business hires
B=read.csv('Business_hires.csv',skip=10,header=T)
B=as.numeric(levels(B[,4])[as.integer(B[,4])])
## Warning: NAs introduced by coercion
B=na.remove(B)

#creating a time series from the hires variable with monthly frequency (holding out the last year to test our forecast)
Actual_Business = ts(B,start=c(2000,12),frequency=12)
Est_set = Actual_Business[1:217]
Test_set = Actual_Business[218:229]
Forecast_Business = ts(Test_set,start=c(2019),frequency=12)
Business = ts(Est_set,start=c(2000,12),frequency=12)
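An equivalent split can be written with `window()`, which preserves the ts start dates and frequency automatically; a sketch of the same estimation/holdout split (names `Business_alt` and `Forecast_alt` are ours for illustration):

```r
# same 217/12 split as above, but letting window() handle the dates:
# the estimation set runs through December 2018, the holdout is 2019
Business_alt = window(Actual_Business, end = c(2018, 12))
Forecast_alt = window(Actual_Business, start = c(2019, 1))
```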

#plotting the data
autoplot(Business,main="Monthly Business and Professional Service Hires",ylab="Thousands",xlab="Year")

acf(Business)

pacf(Business)

(b) Fit a model that includes, trend, seasonality and cyclical components. Make sure to discuss your model in detail.

As we now perform model selection, we will account for trend, seasonality, and cycles as follows. To capture seasonality we will consider a seasonal AR term of order one or, alternatively, a full set of seasonal dummies. For trend, we will use a polynomial in our dynlm models and first differencing otherwise. Lastly, we will use variations of ARMA to capture stochastic cyclical behavior.

t=time(Business)
tcube = t*t*t

#performing model selection
model1 = dynlm(Business~tcube)
model2 = dynlm(Business~tcube+seasonaldummy(Business))
model3= dynlm(Business~L(Business,12)+tcube)
model4= dynlm(Business~L(Business,2)+tcube+seasonaldummy(Business))
model5= Arima(Business,order=c(1,1,0),xreg=seasonaldummy(Business))
model6= Arima(Business,order=c(0,1,1),xreg=seasonaldummy(Business))
model7= Arima(Business,order=c(0,1,0),xreg=seasonaldummy(Business))

#we use BIC to select which model to continue onward with
AIC(model1,model2,model3,model4,model5,model6,model7)
## Warning in AIC.default(model1, model2, model3, model4, model5, model6, model7):
## models are not all fitted to the same number of observations
BIC(model1,model2,model3,model4,model5,model6,model7)
## Warning in BIC.default(model1, model2, model3, model4, model5, model6, model7):
## models are not all fitted to the same number of observations
autoplot(Business,main="Business Services Hires, Fitted Values : \n First Difference + MA(1) + Seasonal Dummies")+autolayer(model6$fit,series="Fitted Values") + xlab('Year')+ylab('Thousands')+ scale_color_manual(values=c('red'), breaks=c('Fitted Values'))

Based on BIC, we have chosen an ARIMA(0,1,1) model with a full set of seasonal dummies. The seasonal dummies account for the seasonality in the data. As we can see in the plot below, the seasonality is significant but not entirely deterministic and hence our seasonal dummies might be missing some seasonal evolution. The cycles in the data are accounted for by the MA(1) component of the model. This component indicates that cycles (likely caused by fluctuations in the business cycle or industry events) are stochastic with fairly short term memory. Lastly, we use the first difference I(1) to flatten any upward trend in our data and make our data more stationary in order to estimate stochastic processes.
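As a check that the MA(1) term has in fact absorbed the short-memory cycle, one could run a Ljung-Box test on the residuals; a sketch (the choice of `lag = 12`, one seasonal cycle, is our assumption):

```r
# Ljung-Box test for remaining autocorrelation in the residuals of model6;
# fitdf = 1 accounts for the one estimated MA coefficient.
# A large p-value is consistent with white-noise residuals.
Box.test(model6$residuals, lag = 12, type = "Ljung-Box", fitdf = 1)
```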

monthplot(Business,ylab="Seasonal Hires (Thousands)",xlab='Month',main="Business Hires by Season")

As we can see, the highest season for hires (on average) is April, while the lowest is December. There is considerable fluctuation within seasons, but November and December generally see the fewest hires of any month, which makes sense given typical recruitment and hiring cycles.

(c) Plot the respective residuals vs. fitted values and discuss your observations.

plot(y=model6$res,x=model6$fit,type='h',main='Business Hires: Residual Plot',xlab='Fitted Hires, Thousands',ylab='Residuals, Thousands')

The residuals seem to be heteroskedastic, with variance increasing for larger fitted values. All of the fitted values below 650,000 hires have quite large positive residuals. The large variation in residuals at high fitted hire levels likely indicates that our model inaccurately characterizes some components of seasonality and hence under- or over-estimates large spikes in hiring rates.

(e) Plot the ACF and PACF of the respective residuals and interpret the plots.

autoplot(model6$residuals,main='Business Hires: Residual Plot',xlab='Year',ylab='Residuals, Thousands')

acf(model6$residuals)

pacf(model6$res)

As suspected, the residual plot shows evidence of residual seasonality, particularly at 6-month increments. We also see unaccounted-for behavior around the 2008 recession, whose anomalous dynamics our model was not built to capture. Both the ACF and PACF show a single significant spike at 6 months; we might include this lag in a future model.
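One way to fold in that 6-month spike would be a seasonal MA term at period 6 on top of the current specification; a hypothetical refit (the name `model6b` is ours, and this candidate is not estimated above):

```r
# candidate refit: add a seasonal MA(1) at period 6 to absorb the
# residual spike seen in the ACF/PACF at lag 6 (sketch, not run above)
model6b = Arima(Business, order = c(0,1,1),
                seasonal = list(order = c(0,0,1), period = 6),
                xreg = seasonaldummy(Business))
BIC(model6b)  # compare against model6 before adopting it
```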

(f) Plot the respective CUSUM and interpret the plot.

plot(efp(model6$res~1, type = "Rec-CUSUM"),main='Business Hires: Recursive CUSUM', xlab='Year')

Based on the CUSUM, we see that our model exhibits some fluctuation, particularly around the time of the recession, but never leaves the red significance bands, indicating that our model fit holds.

(g) Plot the respective Recursive Residuals and interpret the plot.

y=recresid(model6$res~1)
plot(ts(y,start=c(2001,1),freq=12), pch=16,ylab="Recursive Residuals", main = "Business Hires: Recursive Residuals",xlab="Year")

Based on the recursive residuals, we see that the residuals are fairly evenly distributed in terms of variance. However, they are generally negative during the recession, indicating that the model overestimated hires during that period. We also still see evidence of unaccounted-for seasonality in our data.

(h) For your model, discuss the associated diagnostic statistics.

accuracy(model6)
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 1.481038 63.25124 50.06809 -0.2234241 5.502051 0.5591305
##                     ACF1
## Training set 0.009899113

The mean absolute percent error (MAPE) of our model is a strong 5.5 percent, indicating that on average the model misses its target by 5.5% of the actual value. The mean absolute error indicates that this percentage equates to roughly 50,000 hires of inaccuracy for any given prediction. The mean error of +1.48 thousand tells us that the model on average underestimates the level of hiring by about 1,480 hires.
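These summary statistics can be reproduced directly from the in-sample residuals; a sketch:

```r
# reproducing ME, MAE, and MAPE by hand from the residuals of model6
res = residuals(model6)
mean(res)                        # ME, in thousands of hires
mean(abs(res))                   # MAE, in thousands of hires
100 * mean(abs(res) / Business)  # MAPE, in percent
```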

(i) Use your model to forecast 12-steps ahead. Your forecast should include the respective error bands.

newDummy=seasonaldummy(ts(1:12,start=2019,freq=12))
plot(forecast(model6,h=12,xreg=newDummy), main='Business Hires: Forecasts from ARIMA(0,1,1)+Seasonal Dummies',
     xlab='Year', ylab='Hires:Thousands')

(j) Compare your forecast from (i) to the 12-steps ahead forecasts from ARIMA, Holt-Winters, and ETS models. Which model performs best in terms of MAPE?

#Creating our forecasts for all 4 fitted models
our_fit = forecast(model6,h=12,xreg=newDummy)
arima_fit = forecast(auto.arima(Business),h=12)
holt_fit = forecast(HoltWinters(Business),h=12)
ets_fit = forecast(ets(Business),h=12)

#plotting the three other forecasts
plot(arima_fit,main = "ARIMA Fitted Forecast", xlab='Year', ylab='Hires:Thousands')

plot(holt_fit,main = "HoltWinters Fitted Forecast", xlab='Year', ylab='Hires:Thousands')

plot(ets_fit,main = "ETS Fitted Forecast", xlab='Year', ylab='Hires:Thousands')

#comparing accuracy measures
accuracy(our_fit$mean,Forecast_Business)
##                ME    RMSE      MAE      MPE     MAPE        ACF1 Theil's U
## Test set 18.66088 52.0162 41.08207 1.430385 3.409015 -0.08728397 0.4253809
accuracy(arima_fit$mean,Forecast_Business)
##                ME     RMSE      MAE      MPE     MAPE       ACF1 Theil's U
## Test set 19.71267 50.16997 40.91024 1.547334 3.372143 -0.1314729 0.4387349
accuracy(holt_fit$mean,Forecast_Business)
##                ME     RMSE      MAE        MPE     MAPE       ACF1 Theil's U
## Test set -2.65377 48.91024 37.78425 -0.3993721 3.162269 -0.1292721 0.4099882
accuracy(ets_fit$mean,Forecast_Business)
##               ME     RMSE      MAE       MPE     MAPE        ACF1 Theil's U
## Test set 10.2775 46.11457 36.99824 0.7053211 3.089072 -0.07708117 0.3768051

Comparing the accuracy of the 12-month-ahead forecasts on the out-of-sample test set, our model, the auto.arima model, the Holt-Winters model, and the ETS model have MAPEs of 3.41, 3.37, 3.16, and 3.09 respectively. This indicates that the ETS model is slightly superior to the forecasts derived from the other models.

(k) Combine the four forecasts and comment on the MAPE from this forecast vs. the individual ones.

#Using regression analysis on the model's fitted values to determine which forecasts to prioritize in the forecast combination
ours = our_fit$fitted
arimas=arima_fit$fitted
holts=holt_fit$fitted
etss=ets_fit$fitted
all_forecasts=cbind(ours,arimas,holts,etss)
actual_data=Business

#running the regression on actual values vs fitted for each model
optimal=lm(actual_data~ours+arimas+holts+etss)
summary(optimal)
## 
## Call:
## lm(formula = actual_data ~ ours + arimas + holts + etss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -172.040  -37.414    7.784   39.354  207.446 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  15.4160    30.8366   0.500    0.618
## ours         -0.3054     0.9952  -0.307    0.759
## arimas        0.2059     0.2036   1.012    0.313
## holts        -0.0749     0.1961  -0.382    0.703
## etss          1.1583     0.9900   1.170    0.243
## 
## Residual standard error: 59.84 on 200 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.8692, Adjusted R-squared:  0.8666 
## F-statistic: 332.2 on 4 and 200 DF,  p-value: < 2.2e-16
# The resulting weights are -0.305  0.206    -0.075    1.158

#Computing the combined forecast vector based on the forecast of each model times the weight of that forecast
combined = our_fit$mean[1:12]*-0.305+arima_fit$mean*0.206+holt_fit$mean*-.075+ets_fit$mean*1.158


#comparing the MAPE of the combined forecast
accuracy(combined,Forecast_Business)
##                ME     RMSE      MAE      MPE     MAPE        ACF1 Theil's U
## Test set 29.10296 52.85205 41.90554 2.329198 3.474463 -0.05851752 0.4633038

The combined model has a MAPE of 3.47, which actually underperforms each of the individual models. This result likely indicates that the regression method used for weight selection was not ideal for combining these forecasts.
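A common alternative that avoids estimating weights altogether is the equal-weight average, which is often hard to beat in practice; a sketch (the name `simple_combined` is ours):

```r
# equal-weight forecast combination as a baseline against the
# regression-weighted combination above
simple_combined = (our_fit$mean[1:12] + arima_fit$mean +
                   holt_fit$mean + ets_fit$mean) / 4
accuracy(simple_combined, Forecast_Business)
```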

Data Set 2: Hires in Finance Activities

(a) Produce a time-series plot of your data including the respective ACF and PACF plots.

read.csv("Finance_hires.csv",skip=10,header=T)[4] %>% ts %>%
  as.numeric %>% ts(start=c(2000,12),freq=12) -> fh
autoplot(fh, main='Finance Hires', xlab='Year', ylab='Thousands')

ggAcf(fh)

ggPacf(fh)

(b) Fit a model that includes, trend, seasonality and cyclical components. Make sure to discuss your model in detail.

This model forecasts finance hires. The trend and cycle in the data are dominated by the business cycle, which is rather stochastic. Hence we opt against fitting an arbitrary deterministic trend, be it polynomial or periodic, and instead use an ARIMA model to capture the trend and cycle. Let's see auto.arima's choice under different information criteria, using a large maximum order. We enforce the use of seasonal dummies.

auto.arima(fh,seasonal=F,xreg=seasonaldummy(fh),ic='bic',max.p=12,
  max.q=12,max.P=5,max.Q=5,max.order = 24)
## Series: fh 
## Regression with ARIMA(1,1,1) errors 
## 
## Coefficients:
##           ar1      ma1       Jan      Feb       Mar      Apr      May      Jun
##       -0.1638  -0.5562  -24.2236  43.8219  -11.4935  -1.1898  38.0992  39.0219
## s.e.   0.0993   0.0839    4.1778   4.0803    4.2504   4.3285   4.3807   4.3981
##          Jul      Aug      Sep      Oct      Nov
##       46.523  43.7628  21.6768  14.1258  38.4074
## s.e.   4.383   4.3331   4.2581   4.0873   4.2490
## 
## sigma^2 estimated as 243.2:  log likelihood=-947.6
## AIC=1923.21   AICc=1925.17   BIC=1971.28
auto.arima(fh,seasonal=F,xreg=seasonaldummy(fh),ic='aic',max.p=12,
  max.q=12,max.P=5,max.Q=5,max.order = 24)
## Series: fh 
## Regression with ARIMA(1,1,1) errors 
## 
## Coefficients:
##           ar1      ma1       Jan      Feb       Mar      Apr      May      Jun
##       -0.1638  -0.5562  -24.2236  43.8219  -11.4935  -1.1898  38.0992  39.0219
## s.e.   0.0993   0.0839    4.1778   4.0803    4.2504   4.3285   4.3807   4.3981
##          Jul      Aug      Sep      Oct      Nov
##       46.523  43.7628  21.6768  14.1258  38.4074
## s.e.   4.383   4.3331   4.2581   4.0873   4.2490
## 
## sigma^2 estimated as 243.2:  log likelihood=-947.6
## AIC=1923.21   AICc=1925.17   BIC=1971.28
auto.arima(fh,seasonal=F,xreg=seasonaldummy(fh),ic='aicc',max.p=12,
  max.q=12,max.P=5,max.Q=5,max.order = 24)
## Series: fh 
## Regression with ARIMA(1,1,1) errors 
## 
## Coefficients:
##           ar1      ma1       Jan      Feb       Mar      Apr      May      Jun
##       -0.1638  -0.5562  -24.2236  43.8219  -11.4935  -1.1898  38.0992  39.0219
## s.e.   0.0993   0.0839    4.1778   4.0803    4.2504   4.3285   4.3807   4.3981
##          Jul      Aug      Sep      Oct      Nov
##       46.523  43.7628  21.6768  14.1258  38.4074
## s.e.   4.383   4.3331   4.2581   4.0873   4.2490
## 
## sigma^2 estimated as 243.2:  log likelihood=-947.6
## AIC=1923.21   AICc=1925.17   BIC=1971.28

The auto.arima function consistently chooses ARIMA(1,1,1) without drift. Now we construct this model:

fh.tsc=Arima(fh,order=c(1,1,1),xreg=seasonaldummy(fh))
summary(fh.tsc)
## Series: fh 
## Regression with ARIMA(1,1,1) errors 
## 
## Coefficients:
##           ar1      ma1       Jan      Feb       Mar      Apr      May      Jun
##       -0.1638  -0.5562  -24.2236  43.8219  -11.4935  -1.1898  38.0992  39.0219
## s.e.   0.0993   0.0839    4.1778   4.0803    4.2504   4.3285   4.3807   4.3981
##          Jul      Aug      Sep      Oct      Nov
##       46.523  43.7628  21.6768  14.1258  38.4074
## s.e.   4.383   4.3331   4.2581   4.0873   4.2490
## 
## sigma^2 estimated as 243.2:  log likelihood=-947.6
## AIC=1923.21   AICc=1925.17   BIC=1971.28
## 
## Training set error measures:
##                      ME    RMSE      MAE       MPE     MAPE      MASE
## Training set -0.6920531 15.1143 12.21112 -16.16076 38.84912 0.6832711
##                    ACF1
## Training set 0.01614959
autoplot(fh,main='Finance Hires, Fitted Values:\n ARIMA(1,1,1) errors + Seasonal Dummies')+
  autolayer(fh.tsc$fit,series='Fitted Values')+
  xlab('Year')+ylab('Thousands')+
  scale_color_manual(values=c('red'),
                     breaks=c('Fitted Values'))

The chosen model uses an \(ARMA(1,1)\) to model the stochastic cycle. The \(I(1)\) without drift indicates that no linear trend is visible in the data, and that the series is integrated. The seasonal variation is modeled with 11 seasonal dummies (December is omitted to avoid collinearity) as per the direction of this assignment.
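In symbols, the fitted specification is a regression with ARIMA(1,1,1) errors:

\[ y_t = \sum_{i=1}^{11}\beta_i D_{i,t} + \eta_t, \qquad (1-\phi_1 B)(1-B)\,\eta_t = (1+\theta_1 B)\,\varepsilon_t, \]

where \(D_{i,t}\) are the eleven monthly dummies, \(B\) is the backshift operator, and \(\varepsilon_t\) is white noise.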

Both ARMA coefficients are significant. The seasonal coefficients are mostly significant, as can be seen in the plot below.

seas.mid=fh.tsc$coef[3:13]
seas.err=(diag(fh.tsc$var.coef)**0.5)[3:13]
tcrit=2
seas.high=seas.mid+tcrit*seas.err
seas.low=seas.mid-tcrit*seas.err
plot(seas.mid,type='l',
     main='Financial Hires: Plot of Seasonal Coefficients',
     xlab='Month',ylab='Thousands',
     ylim=c(-200,200))
lines(seas.high,lty=2,col='red')
lines(seas.low,lty=2,col='red')

rm(list=c('tcrit','seas.low','seas.mid','seas.high','seas.err'))

The confidence band is drawn for \(t_{\mathrm{crit}}=2\). We see a spike in hires in February, May through August, and November. The summer spike reflects the influx of summer interns and other seasonal hiring programs.

(c) Plot the respective residuals vs. fitted values and discuss your observations.

plot(fh.tsc$fit,fh.tsc$res,type='h',
     main='Finance Hires: Residual Plot',xlab='Fitted Hires, Thousands',ylab='Residuals, Thousands')

The residuals seem to be heteroskedastic. Notably, the fitted values sometimes drop below 10 and carry abnormally large residuals. This is because the ARIMA model does not take into account the fact that hires must be a positive number.

The low variance at the high extreme of fitted values could reflect saturation of available positions. At the low extreme, firms cannot hire a negative number of people, so the variability in hires is capped.

(e) Plot the ACF and PACF of the respective residuals and interpret the plots.

autoplot(fh.tsc$res,
     main='Finance Hires: Residual Plot',xlab='Year',ylab='Residuals, Thousands')

ggAcf(fh.tsc$res)

ggPacf(fh.tsc$res)

In this residual plot, there still appears to be residual seasonality not accounted for by the seasonal dummies. In fact, if we relax the restriction that we use seasonal dummies, auto.arima chooses to model the seasonality stochastically. The residual seasonality remains because fixed seasonal dummies do not capture the change in the seasonal pattern over the years.
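For reference, the unrestricted search mentioned above would look like the following; a sketch shown for comparison only, since the assignment directs us to use seasonal dummies:

```r
# letting auto.arima model seasonality stochastically (a SARIMA fit)
# instead of forcing fixed seasonal dummies
auto.arima(fh, seasonal = TRUE, ic = 'bic')
```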

(f) Plot the respective CUSUM and interpret the plot.

plot(efp(fh.tsc$res~1, type = "Rec-CUSUM"),main='Finance Hires: Recursive CUSUM', xlab='Year')

In 2008 there is a huge deflection in the recursive CUSUM plot, enough to breach the red significance band. This indicates that our model breaks down severely during the recession. The large deflection is also partly the result of poor fit at the beginning of the series, which set off an upward drift in the CUSUM that eventually breaches the significance band.

(g) Plot the respective Recursive Residuals and interpret the plot.

plot(time(fh)[-1],recresid(fh.tsc$res~1), pch=3,
     xlab="Year",ylab="Recursive Residuals",
     main='Finance Hires: Recursive Residuals')

The recursive residuals are evenly distributed in general, yet their mean is biased downward during the recession and upward at the beginning of the series. This echoes the signal in the CUSUM plot: the model breaks down in the early 2000s and during the recession.

(h) For your model, discuss the associated diagnostic statistics.

accuracy(fh.tsc)
##                      ME    RMSE      MAE       MPE     MAPE      MASE
## Training set -0.6920531 15.1143 12.21112 -16.16076 38.84912 0.6832711
##                    ACF1
## Training set 0.01614959

The appropriate statistics to look at are the MAE and RMSE. The magnitude of the series fluctuates greatly, while its variance does not vary as much, so MAPE and MPE greatly distort the weight placed on errors in periods of low versus high hires (hence the inflated 38.8% MAPE). The RMSE and MAE suggest that our model is overall off by roughly 12 to 15 thousand hires.

(i) Use your model to forecast 12-steps ahead. Your forecast should include the respective error bands.

newDummy=seasonaldummy(ts(1:12,start=2020+2/12,freq=12))
plot(forecast(fh.tsc,h=12,xreg=newDummy),
     main='Finance Hires: Forecasts from ARIMA(1,1,1)+Seasonal Dummies',
     xlab='Year', ylab='Thousands')

(j) Compare your forecast from (i) to the 12-steps ahead forecasts from ARIMA, Holt-Winters, and ETS models. Which model performs best in terms of MAPE?

fh.train=window(fh,end=c(2019,1))
fh.test=window(fh,start=c(2019,2))
fh.tsc=Arima(fh.train,order=c(1,1,1),xreg=seasonaldummy(fh.train))
fh.arima=auto.arima(fh.train)
fh.hw=HoltWinters(fh.train)
suppressWarnings(fh.ets<-ets(fh.train))
# calculating fh forecasts under 4 models
fh.tsc.f=forecast(fh.tsc,h=12,xreg=seasonaldummy(fh.test))
fh.arima.f=forecast(fh.arima,h=12)
fh.hw.f=forecast(fh.hw,h=12)
fh.ets.f=forecast(fh.ets,h=12)
autoplot(fh.tsc.f,
     main='Finance Hires: Forecasts from ARIMA(1,1,1)+Seasonal Dummies',
     xlab='Year', ylab='Thousands')

autoplot(fh.arima.f,
     main='Finance Hires: Forecasts from auto.arima, order=(0,1,0)',
     xlab='Year', ylab='Thousands')

autoplot(fh.hw.f,
     main='Finance Hires: Forecasts from Holt-Winters',
     xlab='Year', ylab='Thousands')

autoplot(fh.ets.f,
     main='Finance Hires: Forecasts from ETS',
     xlab='Year', ylab='Thousands')

We will calculate MAPE below.

# Calculating MAPE
fh.tsc.MAPE=100*mean(abs(fh.tsc.f$mean-fh.test)/fh.test)
fh.arima.MAPE=100*mean(abs(fh.arima.f$mean-fh.test)/fh.test)
fh.hw.MAPE=100*mean(abs(fh.hw.f$mean-fh.test)/fh.test)
fh.ets.MAPE=100*mean(abs(fh.ets.f$mean-fh.test)/fh.test)
print(fh.tsc.MAPE)
## [1] 22.1688
print(fh.arima.MAPE)
## [1] 22.93805
print(fh.hw.MAPE)
## [1] 25.84444
print(fh.ets.MAPE)
## [1] 22.90089

In terms of MAPE, our initial model performs best, with the lowest MAPE of the four (22.17%).

(k) Combine the four forecasts and comment on the MAPE from this forecast vs. the individual ones.

# combining the forecasts
fh.combine.f=lm(fh.test~fh.tsc.f$mean+fh.arima.f$mean+fh.hw.f$mean+fh.ets.f$mean)$fit
fh.combine.MAPE=100*mean(abs(fh.combine.f-fh.test)/fh.test)
print(fh.combine.MAPE)
## [1] 9.787779

The combined model has a significantly lower MAPE than any of the four models by themselves. Note, however, that the combination weights were estimated by regressing on the test set itself, so this MAPE is effectively in-sample and likely overstates the combination's out-of-sample advantage.
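A fairer comparison would estimate the combination weights on the training sample and only then apply them to the test-period forecasts; a sketch, assuming each forecast object's `$fitted` series aligns with `fh.train` (the names `w` and `fh.fair` are ours):

```r
# estimate combination weights on the training sample, then apply them
# to the held-out forecasts so the reported MAPE stays out-of-sample
w = coef(lm(fh.train ~ fh.tsc.f$fitted + fh.arima.f$fitted +
              fh.hw.f$fitted + fh.ets.f$fitted))
fh.fair = w[1] + w[2]*fh.tsc.f$mean + w[3]*fh.arima.f$mean +
  w[4]*fh.hw.f$mean + w[5]*fh.ets.f$mean
100 * mean(abs(fh.fair - fh.test) / fh.test)  # out-of-sample MAPE
```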

(l) Fit an appropriate VAR model using your two variables. Make sure to show the relevant plots and discuss your results from the fit.

fest=ts(fh[1:217],start=c(2000,12),freq=12)
ftest=ts(fh[218:229],start=2019,freq=12)
ggCcf(fest,Business)

From the cross-correlation plot, there are some peaks at high lags, but the structure of those peaks does not conform to a simple statistical process. Let's see what VARselect suggests for the lag order.

y_ts=ts.union(fest, Business)

VARselect(y_ts,lag.max=30)
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##     15     13     13     15 
## 
## $criteria
##                   1            2            3            4            5
## AIC(n) 1.616002e+01 1.557736e+01 1.536932e+01 1.537162e+01 1.535486e+01
## HQ(n)  1.620203e+01 1.564737e+01 1.546734e+01 1.549764e+01 1.550889e+01
## SC(n)  1.626369e+01 1.575015e+01 1.561123e+01 1.568263e+01 1.573499e+01
## FPE(n) 1.042823e+07 5.823349e+06 4.729807e+06 4.741029e+06 4.662821e+06
##                   6            7            8            9           10
## AIC(n) 1.532254e+01 1.531239e+01 1.510614e+01 1.492430e+01 1.489982e+01
## HQ(n)  1.550458e+01 1.552243e+01 1.534419e+01 1.519035e+01 1.519387e+01
## SC(n)  1.577179e+01 1.583075e+01 1.569362e+01 1.558088e+01 1.562552e+01
## FPE(n) 4.515339e+06 4.470809e+06 3.638742e+06 3.034940e+06 2.963020e+06
##                  11           12           13           14           15
## AIC(n) 1.480959e+01 1.449952e+01 1.439456e+01 1.437629e+01 1.436294e+01
## HQ(n)  1.513166e+01 1.484958e+01 1.477263e+01 1.478237e+01 1.479702e+01
## SC(n)  1.560441e+01 1.536345e+01 1.532761e+01 1.537845e+01 1.543421e+01
## FPE(n) 2.709023e+06 1.988192e+06 1.791607e+06 1.760901e+06 1.739510e+06
##                  16           17           18           19           20
## AIC(n) 1.436410e+01 1.439068e+01 1.440570e+01 1.442025e+01 1.442342e+01
## HQ(n)  1.482619e+01 1.488078e+01 1.492379e+01 1.496635e+01 1.499753e+01
## SC(n)  1.550449e+01 1.560019e+01 1.568432e+01 1.576798e+01 1.584027e+01
## FPE(n) 1.743788e+06 1.793381e+06 1.823512e+06 1.853654e+06 1.863367e+06
##                  21           22           23           24           25
## AIC(n) 1.444512e+01 1.447502e+01 1.449434e+01 1.443008e+01 1.444382e+01
## HQ(n)  1.504723e+01 1.510514e+01 1.515247e+01 1.511621e+01 1.515795e+01
## SC(n)  1.593109e+01 1.603010e+01 1.611854e+01 1.612339e+01 1.620624e+01
## FPE(n) 1.908574e+06 1.971446e+06 2.015450e+06 1.895725e+06 1.928279e+06
##                  26           27           28           29           30
## AIC(n) 1.446939e+01 1.449202e+01 1.446783e+01 1.449552e+01 1.453268e+01
## HQ(n)  1.521153e+01 1.526217e+01 1.526598e+01 1.532168e+01 1.538684e+01
## SC(n)  1.630092e+01 1.639267e+01 1.643760e+01 1.653441e+01 1.664067e+01
## FPE(n) 1.985330e+06 2.038700e+06 1.998384e+06 2.063873e+06 2.152536e+06
y_model=VAR(y_ts,p=13)
summary(y_model)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: fest, Business 
## Deterministic variables: const 
## Sample size: 204 
## Log Likelihood: -1990.243 
## Roots of the characteristic polynomial:
## 0.9943 0.9943 0.9858 0.9858 0.9856 0.9856 0.9854 0.9854 0.9846 0.9846 0.9307 0.9307 0.9039 0.9039 0.8682 0.8682 0.8601 0.8601 0.8366 0.8366 0.8183 0.8183 0.7887 0.7887 0.7582 0.7582
## Call:
## VAR(y = y_ts, p = 13)
## 
## 
## Estimation results for equation fest: 
## ===================================== 
## fest = fest.l1 + Business.l1 + fest.l2 + Business.l2 + fest.l3 + Business.l3 + fest.l4 + Business.l4 + fest.l5 + Business.l5 + fest.l6 + Business.l6 + fest.l7 + Business.l7 + fest.l8 + Business.l8 + fest.l9 + Business.l9 + fest.l10 + Business.l10 + fest.l11 + Business.l11 + fest.l12 + Business.l12 + fest.l13 + Business.l13 + const 
## 
##               Estimate Std. Error t value Pr(>|t|)    
## fest.l1       0.237595   0.073661   3.226   0.0015 ** 
## Business.l1   0.080625   0.016647   4.843 2.77e-06 ***
## fest.l2       0.175518   0.069744   2.517   0.0127 *  
## Business.l2   0.002349   0.017900   0.131   0.8957    
## fest.l3       0.095653   0.071188   1.344   0.1808    
## Business.l3  -0.020038   0.017991  -1.114   0.2669    
## fest.l4       0.117046   0.070847   1.652   0.1003    
## Business.l4   0.018790   0.018187   1.033   0.3029    
## fest.l5       0.051612   0.071235   0.725   0.4697    
## Business.l5  -0.034793   0.018187  -1.913   0.0574 .  
## fest.l6      -0.049879   0.072276  -0.690   0.4910    
## Business.l6  -0.025828   0.018394  -1.404   0.1620    
## fest.l7       0.070752   0.072067   0.982   0.3276    
## Business.l7  -0.041015   0.018349  -2.235   0.0267 *  
## fest.l8       0.025139   0.071310   0.353   0.7249    
## Business.l8  -0.008318   0.018578  -0.448   0.6549    
## fest.l9       0.003809   0.070506   0.054   0.9570    
## Business.l9   0.011804   0.018158   0.650   0.5165    
## fest.l10      0.023996   0.070115   0.342   0.7326    
## Business.l10  0.044407   0.018458   2.406   0.0172 *  
## fest.l11     -0.016453   0.069417  -0.237   0.8129    
## Business.l11 -0.036610   0.018761  -1.951   0.0526 .  
## fest.l12      0.294908   0.071261   4.138 5.40e-05 ***
## Business.l12 -0.018345   0.017719  -1.035   0.3019    
## fest.l13     -0.106166   0.067294  -1.578   0.1164    
## Business.l13  0.040382   0.017360   2.326   0.0211 *  
## const        -8.249617   9.626455  -0.857   0.3926    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 16.45 on 177 degrees of freedom
## Multiple R-Squared: 0.791,   Adjusted R-squared: 0.7603 
## F-statistic: 25.76 on 26 and 177 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation Business: 
## ========================================= 
## Business = fest.l1 + Business.l1 + fest.l2 + Business.l2 + fest.l3 + Business.l3 + fest.l4 + Business.l4 + fest.l5 + Business.l5 + fest.l6 + Business.l6 + fest.l7 + Business.l7 + fest.l8 + Business.l8 + fest.l9 + Business.l9 + fest.l10 + Business.l10 + fest.l11 + Business.l11 + fest.l12 + Business.l12 + fest.l13 + Business.l13 + const 
## 
##               Estimate Std. Error t value Pr(>|t|)    
## fest.l1       0.271411   0.317280   0.855  0.39347    
## Business.l1   0.404002   0.071702   5.634 6.81e-08 ***
## fest.l2       0.540769   0.300408   1.800  0.07355 .  
## Business.l2   0.227323   0.077099   2.948  0.00362 ** 
## fest.l3       0.004833   0.306628   0.016  0.98744    
## Business.l3   0.124677   0.077491   1.609  0.10942    
## fest.l4      -0.332092   0.305161  -1.088  0.27796    
## Business.l4  -0.064452   0.078338  -0.823  0.41176    
## fest.l5      -0.687016   0.306833  -2.239  0.02640 *  
## Business.l5   0.044767   0.078337   0.571  0.56840    
## fest.l6      -0.366612   0.311315  -1.178  0.24053    
## Business.l6   0.017399   0.079228   0.220  0.82643    
## fest.l7       0.135934   0.310414   0.438  0.66198    
## Business.l7   0.014863   0.079035   0.188  0.85105    
## fest.l8       0.161316   0.307154   0.525  0.60010    
## Business.l8   0.009726   0.080021   0.122  0.90340    
## fest.l9       0.436627   0.303689   1.438  0.15227    
## Business.l9   0.192540   0.078214   2.462  0.01479 *  
## fest.l10     -0.065270   0.302008  -0.216  0.82914    
## Business.l10 -0.043207   0.079504  -0.543  0.58750    
## fest.l11      1.248477   0.299001   4.175 4.66e-05 ***
## Business.l11 -0.054454   0.080810  -0.674  0.50128    
## fest.l12     -0.748967   0.306945  -2.440  0.01567 *  
## Business.l12  0.282242   0.076323   3.698  0.00029 ***
## fest.l13     -0.924387   0.289855  -3.189  0.00169 ** 
## Business.l13 -0.195745   0.074776  -2.618  0.00962 ** 
## const        64.919788  41.464107   1.566  0.11921    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 70.86 on 177 degrees of freedom
## Multiple R-Squared: 0.8346,  Adjusted R-squared: 0.8103 
## F-statistic: 34.36 on 26 and 177 DF,  p-value: < 2.2e-16 
## 
## 
## 
## Covariance matrix of residuals:
##            fest Business
## fest     270.68    53.36
## Business  53.36  5021.83
## 
## Correlation matrix of residuals:
##             fest Business
## fest     1.00000  0.04577
## Business 0.04577  1.00000

We choose a VAR(13) based on BIC (the SC criterion above). This lag length allows the model to account for seasonality as well as impulses from the other variable over a broad timeframe. The fitted coefficients indicate that business hiring and finance hiring are loosely related, as some of the cross-lag coefficients are significant. The model gains the most predictive power from lags between 11 and 13 months in the past, for both finance and business hiring.
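To make the "loosely related" claim precise, one could test Granger causality in each direction using the vars package already loaded; a sketch:

```r
# does finance hiring help predict business hiring, and vice versa?
causality(y_model, cause = "fest")$Granger
causality(y_model, cause = "Business")$Granger
```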

(m) Compute, plot, and interpret the respective impulse response functions.

irf(y_model)
## 
## Impulse response coefficients
## $fest
##             fest  Business
##  [1,] 16.4522471  3.243483
##  [2,]  4.1704848  5.775693
##  [3,]  4.3518453 13.099492
##  [4,]  4.3444085 10.525485
##  [5,]  4.9452403  5.830149
##  [6,]  3.9190022 -2.821163
##  [7,]  1.6554059 -5.614533
##  [8,]  2.1269880 -3.258561
##  [9,]  1.3871709 -4.013817
## [10,]  0.5403916  2.129876
## [11,]  1.2933437 -0.973847
## 
## $Business
##            fest  Business
##  [1,]  0.000000 70.790630
##  [2,]  5.707527 28.599544
##  [3,]  3.828215 29.195695
##  [4,]  2.913968 31.247908
##  [5,]  5.255258 21.152799
##  [6,]  2.062489 21.740187
##  [7,]  1.335324 16.341063
##  [8,] -1.690171 10.836851
##  [9,] -1.720744  8.358957
## [10,] -1.158446 17.012261
## [11,]  1.720982 12.074184
## 
## 
## Lower Band, CI= 0.95 
## $fest
##             fest   Business
##  [1,] 13.6640982  -4.716981
##  [2,]  1.5444911  -4.257706
##  [3,]  1.6781554   2.184920
##  [4,]  1.3484645  -3.738510
##  [5,]  1.5605819  -6.636468
##  [6,]  0.6777998 -14.710626
##  [7,] -1.6025828 -18.746790
##  [8,] -0.8863596 -14.489489
##  [9,] -1.4304033 -16.150044
## [10,] -2.2835527 -13.540669
## [11,] -1.3660996 -15.469294
## 
## $Business
##             fest  Business
##  [1,]  0.0000000 57.793986
##  [2,]  3.0066119 15.133700
##  [3,]  0.9988691 15.481743
##  [4,]  0.5383600 15.619104
##  [5,]  2.0179275  6.867773
##  [6,] -0.6476211  6.747886
##  [7,] -1.7844307  1.135875
##  [8,] -4.0361656 -1.974966
##  [9,] -3.8872413 -5.037094
## [10,] -3.9831070  2.405289
## [11,] -1.1339383 -1.079471
## 
## 
## Upper Band, CI= 0.95 
## $fest
##            fest  Business
##  [1,] 16.899430 10.764525
##  [2,]  5.873509 13.312782
##  [3,]  6.014002 22.561786
##  [4,]  6.200381 21.671982
##  [5,]  6.894514 15.943608
##  [6,]  6.097185  7.253293
##  [7,]  3.966258  8.952144
##  [8,]  4.451224  6.143871
##  [9,]  3.689069  7.383395
## [10,]  2.019517 11.375126
## [11,]  3.236086  9.817675
## 
## $Business
##            fest Business
##  [1,] 0.0000000 71.33841
##  [2,] 7.3101655 34.01752
##  [3,] 5.3028029 35.78225
##  [4,] 4.3565870 37.86974
##  [5,] 6.8066653 27.39095
##  [6,] 3.6060410 28.84413
##  [7,] 3.8138737 25.79419
##  [8,] 0.4699005 16.89283
##  [9,] 0.2170091 16.30716
## [10,] 0.6750872 25.25491
## [11,] 3.7414763 20.63112
plot(irf(y_model, n.ahead=50))

We can see that both series have autoregressive behavior; business hires in particular shows very strong persistence. However, the cross impulse response functions reflect that the series are primarily related on a seasonal basis, as the significance band almost always covers the zero line before jumping upwards at yearly increments. This indicates that our initial hypothesis of a relation between these two variables is likely incorrect, outside of the influence of shared seasonal behavior. We test this further with Granger causality below.

(n) Perform a Granger-Causality test on your variables and discuss your results from the test.

grangertest(fest~Business,order=20)
grangertest(Business~fest,order=20)
grangertest(fest~Business,order=4)
grangertest(Business~fest,order=4)

If we look at a broad enough time frame, the two variables do appear to Granger-cause each other. However, because we know both series have significant seasonality, this apparent causality is likely a result of shared recruitment seasons rather than a genuine relation in their long-term behavior.
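For reference, `grangertest` (from the `lmtest` package) fits the unrestricted model (own lags plus the other series' lags) against the restricted model (own lags only) and compares them with an F-test. A minimal self-contained illustration, on simulated data where x genuinely leads y:

```r
library(lmtest)

set.seed(2)
n <- 200
x <- rnorm(n)
y <- c(0, 0.8 * x[-n]) + rnorm(n, sd = 0.1)  # y_t depends on x_{t-1}

# Unrestricted: y ~ lag(y) + lag(x); restricted: y ~ lag(y).
gt <- grangertest(y ~ x, order = 1)
gt  # a small Pr(>F) rejects "x does not Granger-cause y"
```

With strongly seasonal series such as ours, the same test can reject for the wrong reason, which is why we interpret the result above with caution.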

(o) Use your VAR model to forecast 12-steps ahead. Your forecast should include the respective error bands. Comment on the differences between the VAR forecast and the other ones obtained using the different methods.

forecastVar = forecast(y_model,h=12)
plot(forecastVar,main = "1 Year Forecasts from VAR(13) for Finance (top) and Business (bottom)")

We compare our forecast against the earlier ones using MAPE below.

accuracy(forecastVar$forecast$fest$mean,ftest)
##                ME     RMSE      MAE      MPE     MAPE      ACF1 Theil's U
## Test set 7.047786 14.34272 11.61987 3.905513 16.78461 0.3387653  0.159968
accuracy(forecastVar$forecast$Business$mean,Forecast_Business)
##                ME     RMSE      MAE      MPE     MAPE       ACF1 Theil's U
## Test set 15.43331 52.52061 40.74248 1.134856 3.369358 0.06329162  0.464293

The business MAPE of 3.37 is on par with our best forecasts from the individual models. The finance MAPE of 16.78 is superior to all of the models that were fitted previously. Thus, this VAR model does have strong predictive accuracy.
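As a reminder of what these numbers measure, the MAPE reported by `accuracy()` is the mean absolute percentage error. With toy numbers (hypothetical, not taken from our series):

```r
# MAPE = mean(|actual - forecast| / |actual|) * 100
actual <- c(100, 110, 120)
fcast  <- c( 95, 115, 118)
mape <- mean(abs(actual - fcast) / abs(actual)) * 100
round(mape, 2)  # 3.74
```

Because the error is scaled by the actual value, the finance series (smaller hire counts) naturally produces larger MAPE values than the business series.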

(p) Fit a GARCH model to the residuals from your favorite model, and produce a new 12-steps ahead forecast, including one for the variance.

We will use the residuals from the business hires model, ARIMA(0,1,1) + seasonal dummies, as input for the GARCH model.

Business.garch.input = model6$res
tsdisplay(Business.garch.input^2, main='Business Hires: Squared Residuals')

The ACF and PACF of the squared residuals are not indicative of the order of GARCH to use. We will try fitting a large GARCH(12,12) model to see whether the variance is driven by seasonality, together with an ARMA(2,2) mean model to account for stochastic behavior in the residuals. That said, the ACF and PACF do not suggest that this is a GARCH process.
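For context, the sGARCH recursion that the spec below generalises is, in the GARCH(1,1) case, sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1]. A toy simulation with assumed parameters (not fitted values) shows the mechanics:

```r
# GARCH(1,1) variance recursion with toy parameters (not fitted values):
#   sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1]
set.seed(3)
omega <- 0.1; alpha <- 0.1; beta <- 0.8
n <- 500
eps    <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- omega / (1 - alpha - beta)  # unconditional variance = 1
eps[1] <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  sigma2[t] <- omega + alpha * eps[t - 1]^2 + beta * sigma2[t - 1]
  eps[t]    <- sqrt(sigma2[t]) * rnorm(1)
}
c(unconditional = omega / (1 - alpha - beta), sample = var(eps))
```

A GARCH(12,12) simply extends the recursion to 12 lags of eps^2 and of sigma2, which is what lets it pick up roughly annual volatility clustering in monthly data.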

# Specify an ARMA + GARCH Model
garchmod = ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(12, 12)),
                      mean.model = list(armaOrder = c(2, 2), include.mean = TRUE),
                      distribution.model = "norm")

Business.garch= ugarchfit(spec=garchmod,data=Business.garch.input)

#plot the garch forecast
modelfor = ugarchforecast(Business.garch, data = NULL, n.ahead = 12, n.roll = 0, out.sample = 0)
plot(modelfor,main="GARCH(12,12) Forecast", which = 1)

Here we have our GARCH forecast, which we will combine with our model's point forecast.

#We take the following 12-step-ahead estimates from the GARCH model's forecast.
variance = c(55.21, -1.1825, 6.4173, -0.5173, 4.0107, 2.9488, 0.4105, 5.7601, -0.9265, 5.3400, 1.0744, 2.3306)

newforecast = our_fit$mean + variance

plot(Actual_Business,col="grey",main="Forecast including GARCH variance",ylab="Hires (thousands)")
lines(ts(newforecast,start=2019,freq=12),col ="blue")

accuracy(newforecast,Forecast_Business)
##               ME     RMSE      MAE       MPE     MAPE       ACF1 Theil's U
## Test set 11.9212 60.73033 45.31396 0.8478915 3.775467 -0.1066129 0.4268198

Above is our plot combining the variance forecast (GARCH) with the point forecast from model6 (ARIMA + seasonal dummies). From the MAPE we do not see a significant improvement in this model's performance from including the GARCH estimates; the MAPE of 3.78 is actually slightly worse than that of our earlier models.

III. (5%) Conclusions and Future Work.

For Professional and Business Services hires, we fit an ARIMA(0,1,1) + seasonal dummies model. The seasonal coefficients indicate that the hiring peak occurs in April, but because a stochastic cycle dominates, the dummies characterize the seasonality inaccurately. For Financial Activities hires, we fit an ARIMA(1,1,1) + seasonal dummies model; its coefficients place the hiring peaks in February and the summer. Again, fixed seasonal dummies cannot capture the varying seasonality of hiring. Our models for both series tend to break down during the recession, as indicated by the recursive residuals and CUSUM plots.
The VAR model, which forecasts both series simultaneously, shows that both are driven first and foremost by seasonality; the residual seasonal fluctuations show up in the impulse response functions. This suggests a direction for future work: a better model should separate the seasonality from the VAR process so that the VAR assumptions about the errors hold. Furthermore, business hires showed more persistence in its own impulse response than finance hires; the reason is unclear to us and should be a future direction of investigation. We have not found strong causality between the two hiring series in the present procedure, which is expected for a preliminary analysis of the data. A far more promising direction would be to perform the same analysis over all pairs of such hiring series, to tease out those that can be accurately described with our model.

IV. (5%) References (include the source of your data and any other resources).


V. (5%) R Source code. Although the code is only worth 5%, if you do not submit your code, you will not receive credit for the assignment.
